{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "<a id='HOME'></a>\n",
    "# CHAPTER 8 Data Has to Go Somewhere\n",
    "## 資料的歸宿\n",
    "\n",
    "* [8.1 資料的輸出與輸入](#IO)\n",
    "* [8.2 結構化資料檔案](#StructuredText)\n",
    "* [8.3 結構化二進位檔案](#StructuredBinary)\n",
    "* [8.4 關聯式資料庫](#RelationalDatabases)\n",
    "* [8.5 NoSQL資料庫](#NoSQL)\n",
    "* [8.6 全文檢索資料庫](#Full-TextDatabases)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='IO'></a>\n",
    "## 8.1 資料的輸出與輸入\n",
    "[回目錄](#HOME)\n",
    "\n",
    "```python\n",
    "fileobj = open(filename, mode)\n",
    "```\n",
    "\n",
    "|mode 第一個字母|解釋|\n",
    "|:---:|----|\n",
    "|r |表示讀模式|\n",
    "|w |表示寫模式。如果文件不存在則新創建，如果存在則重寫新內容|\n",
    "|x |表示在文件不存在的情況下新創建並寫文件|\n",
    "|a |表示如果文件存在，在文件末尾追加寫內容|\n",
    "\n",
    "\n",
    "|mode 第二個字母|解釋|\n",
    "|:---:|----|\n",
    "|t|（或者省略）代表文本類型|\n",
    "|b| 代表二進位文件|\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.1.1 使用write()與print()寫入檔案\n",
    "\n",
    "\n",
    "使用print寫入時可以使用sep與end參數指定分隔符號與結束符號\n",
    "* sep，預設為' '\n",
    "* end，預設為'\\n'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "abc = 'abc'\n",
    "defg = 'defg'\n",
    "\n",
    "#wirte寫入\n",
    "fout = open('Data/relativity', 'wt')\n",
    "fout.write('{0}-{1}。'.format(abc, 'XD'))\n",
    "fout.write(defg)\n",
    "fout.close()\n",
    "\n",
    "#print寫入(預設)\n",
    "fout = open('Data/relativity', 'wt')\n",
    "print(abc, defg, file=fout)\n",
    "fout.close()\n",
    "\n",
    "##print寫入(更換分隔與結束符號)\n",
    "fout = open('Data/relativity', 'wt')\n",
    "print(abc, defg, sep='-', end='。', file=fout)\n",
    "fout.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "文件已存在。\n"
     ]
    }
   ],
   "source": [
    "#分段寫入\n",
    "poem = '''There was a young lady named Bright,\n",
    "Whose speed was far faster than light;\n",
    "She started one day\n",
    "In a relative way,\n",
    "And returned on the previous night.'''\n",
    "size = len(poem)\n",
    "\n",
    "fout = open('Data/relativity', 'wt')\n",
    "\n",
    "offset = 0\n",
    "chunk = 100\n",
    "while True:\n",
    "    if offset > size:\n",
    "        fout.close()\n",
    "        break\n",
    "    fout.write(poem[offset:offset+chunk])\n",
    "    offset += chunk\n",
    "    \n",
    "\n",
    "#open模式改為x可以防止覆蓋已存在之文件\n",
    "try:\n",
    "    fout = open('Data/relativity', 'xt')\n",
    "    fout.write('stomp stomp stomp')\n",
    "except FileExistsError:\n",
    "    print('文件已存在。')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.1.2使用read()，readline()或者readlines()讀取檔案\n",
    "\n",
    "* __fin.read()__，一次讀入全部，或是指定讀入字節數，注意記憶體占用情況\n",
    "* __fin.readline()__，一次讀入一行\n",
    "* __fin.readlines()__，疊代器用法，寫法更好看"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There was a young lady named Bright,\n",
      "Whose speed was far faster than light;\n",
      "She started one day\n",
      "In a relative way,\n",
      "And returned on the previous night.\n",
      "\n",
      "================\n",
      "There was a young lady named Bright,\n",
      "Whose speed was far faster than light;\n",
      "She started one day\n",
      "In a relative way,\n",
      "And returned on the previous night.\n",
      "\n",
      "================\n",
      "There was a young lady named Bright,\n",
      "Whose speed was far faster than light;\n",
      "She started one day\n",
      "In a relative way,\n",
      "And returned on the previous night.\n",
      "\n",
      "================\n",
      "共 5 行\n",
      "There was a young lady named Bright,\n",
      "。Whose speed was far faster than light;\n",
      "。She started one day\n",
      "。In a relative way,\n",
      "。And returned on the previous night.。"
     ]
    }
   ],
   "source": [
    "#一次讀入\n",
    "fin = open('Data/relativity', 'rt' )\n",
    "poem = fin.read()\n",
    "fin.close()\n",
    "print(poem)\n",
    "\n",
    "\n",
    "#指定一次100字節\n",
    "print('\\n================')\n",
    "poem = ''\n",
    "fin = open('Data/relativity', 'rt' )\n",
    "chunk = 100\n",
    "while True:\n",
    "    fragment = fin.read(chunk)\n",
    "    if not fragment:\n",
    "        fin.close()\n",
    "        break\n",
    "    poem += fragment\n",
    "\n",
    "print(poem)\n",
    "\n",
    "#使用readline一次讀入一行\n",
    "print('\\n================')\n",
    "poem = ''\n",
    "fin = open('Data/relativity', 'rt' )\n",
    "while True:\n",
    "    line = fin.readline()\n",
    "    if not line:\n",
    "        fin.close()\n",
    "        break\n",
    "    poem += line\n",
    "\n",
    "print(poem)\n",
    "\n",
    "#使用readlines疊代器\n",
    "print('\\n================')\n",
    "fin = open('Data/relativity', 'rt' )\n",
    "lines = fin.readlines()\n",
    "fin.close()\n",
    "print('共', len(lines), '行')\n",
    "\n",
    "for line in lines:\n",
    "    print(line, end='。')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.1.3 寫入二進位檔案"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "bdata = bytes(range(0, 256))\n",
    "#print(bdata)\n",
    "\n",
    "#一次寫入\n",
    "fout = open('Data/bfile', 'wb')\n",
    "fout.write(bdata)\n",
    "fout.close()\n",
    "\n",
    "#批次寫入\n",
    "fout = open('Data/bfile', 'wb')\n",
    "size = len(bdata)\n",
    "offset = 0\n",
    "chunk = 100\n",
    "while True:\n",
    "    if offset > size:\n",
    "        fout.close()\n",
    "        break\n",
    "    fout.write(bdata[offset:offset+chunk])\n",
    "    offset += chunk\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.1.4 讀取二進位檔案"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b'\\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\t\\n\\x0b\\x0c\\r\\x0e\\x0f\\x10\\x11\\x12\\x13\\x14\\x15\\x16\\x17\\x18\\x19\\x1a\\x1b\\x1c\\x1d\\x1e\\x1f !\"#$%&\\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\\x7f\\x80\\x81\\x82\\x83\\x84\\x85\\x86\\x87\\x88\\x89\\x8a\\x8b\\x8c\\x8d\\x8e\\x8f\\x90\\x91\\x92\\x93\\x94\\x95\\x96\\x97\\x98\\x99\\x9a\\x9b\\x9c\\x9d\\x9e\\x9f\\xa0\\xa1\\xa2\\xa3\\xa4\\xa5\\xa6\\xa7\\xa8\\xa9\\xaa\\xab\\xac\\xad\\xae\\xaf\\xb0\\xb1\\xb2\\xb3\\xb4\\xb5\\xb6\\xb7\\xb8\\xb9\\xba\\xbb\\xbc\\xbd\\xbe\\xbf\\xc0\\xc1\\xc2\\xc3\\xc4\\xc5\\xc6\\xc7\\xc8\\xc9\\xca\\xcb\\xcc\\xcd\\xce\\xcf\\xd0\\xd1\\xd2\\xd3\\xd4\\xd5\\xd6\\xd7\\xd8\\xd9\\xda\\xdb\\xdc\\xdd\\xde\\xdf\\xe0\\xe1\\xe2\\xe3\\xe4\\xe5\\xe6\\xe7\\xe8\\xe9\\xea\\xeb\\xec\\xed\\xee\\xef\\xf0\\xf1\\xf2\\xf3\\xf4\\xf5\\xf6\\xf7\\xf8\\xf9\\xfa\\xfb\\xfc\\xfd\\xfe\\xff'\n"
     ]
    }
   ],
   "source": [
    "fin = open('Data/bfile', 'rb')\n",
    "bdata = fin.read()\n",
    "fin.close()\n",
    "\n",
    "print(bdata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8.1.5使用with自動關閉檔案\n",
    "\n",
    "python若後續沒有再繼續使用檔案則會自動關閉，所以在一個function中開啟檔案就算沒有關閉最後也會自動關閉。  \n",
    "但是在一個主程式中開啟則不會自動關閉，所以可以使用with as來做為自動關閉檔案用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with open('Data/relativity', 'wt') as fout:\n",
    "    fout.write(poem)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 使用seek()改變位置\n",
    "\n",
    "__file.tell()__可以查詢讀取位置\n",
    "seek(offset,origin)\n",
    "\n",
    "origin = 0，預設，從頭開始位移。  \n",
    "origin = 1，從目前位置開始位移。  \n",
    "origin = 2，從最後往前為位移。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "255\n",
      "1\n",
      "255\n",
      "255\n"
     ]
    }
   ],
   "source": [
    "fin = open('Data/bfile', 'rb')\n",
    "fin.seek(255)\n",
    "\n",
    "bdata = fin.read()\n",
    "print(len(bdata))\n",
    "print(bdata[0])\n",
    "fin.close()\n",
    "\n",
    "#使用不同方法讀取最後一個字節\n",
    "fin = open('Data/bfile', 'rb')\n",
    "\n",
    "fin.seek(-1, 2)\n",
    "bdata = fin.read()\n",
    "print(len(bdata))\n",
    "print(bdata[0])\n",
    "fin.close()\n",
    "\n",
    "#\n",
    "fin = open('Data/bfile', 'rb')\n",
    "fin.seek(254, 0)\n",
    "fin.tell()\n",
    "fin.seek(1, 1)\n",
    "fin.tell()\n",
    "\n",
    "bdata = fin.read()\n",
    "print(bdata[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='StructuredText'></a>\n",
    "## 8.2 結構化資料檔案\n",
    "[回目錄](#HOME)\n",
    "\n",
    "* CSV\n",
    "* XML\n",
    "* HTML\n",
    "* JSON"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['Doctor', 'No'], [], ['Rosa', 'Klebb'], [], ['Mister', 'Big'], [], ['Auric', 'Goldfinger'], [], ['Ernst', 'Blofeld'], []]\n",
      "[{'first': 'Doctor', 'last': 'No'}, {'first': 'Rosa', 'last': 'Klebb'}, {'first': 'Mister', 'last': 'Big'}, {'first': 'Auric', 'last': 'Goldfinger'}, {'first': 'Ernst', 'last': 'Blofeld'}]\n",
      "[{'first': 'Doctor', 'last33': 'Blofeld', 'last': 'No'}, {'first': 'Rosa', 'last33': 'Blofeld', 'last': 'Klebb'}, {'first': 'Mister', 'last33': 'Blofeld', 'last': 'Big'}, {'first': 'Auric', 'last33': 'Blofeld', 'last': 'Goldfinger'}, {'first': 'Ernst', 'last33': 'Blofeld', 'last': 'Blofeld'}]\n"
     ]
    }
   ],
   "source": [
    "import csv\n",
    "villains = [\n",
    "    ['Doctor', 'No'],\n",
    "    ['Rosa', 'Klebb'],\n",
    "    ['Mister', 'Big'],\n",
    "    ['Auric', 'Goldfinger'],\n",
    "    ['Ernst', 'Blofeld']]\n",
    "\n",
    "# 寫入\n",
    "with open('Data/villains', 'wt') as fout:\n",
    "    csvout = csv.writer(fout)\n",
    "    csvout.writerows(villains)\n",
    "\n",
    "# 讀取\n",
    "with open('Data/villains', 'rt') as fin:\n",
    "    cin = csv.reader(fin)\n",
    "    villains = [row for row in cin]\n",
    "print(villains)\n",
    "\n",
    "# 讀成字典\n",
    "with open('Data/villains', 'rt') as fin:\n",
    "    cin = csv.DictReader(fin, fieldnames=['first', 'last'])\n",
    "    villains = [row for row in cin]\n",
    "print(villains)\n",
    "\n",
    "# 用字典格式寫入\n",
    "villains = [{'first': 'Doctor', 'last': 'No', 'last33': 'Blofeld'},\n",
    "    {'first': 'Rosa', 'last': 'Klebb', 'last33': 'Blofeld'},\n",
    "    {'first': 'Mister', 'last': 'Big', 'last33': 'Blofeld'},\n",
    "    {'first': 'Auric', 'last': 'Goldfinger', 'last33': 'Blofeld'},\n",
    "    {'first': 'Ernst', 'last': 'Blofeld', 'last33': 'Blofeld'}]\n",
    "\n",
    "with open('Data/villains', 'wt') as fout:\n",
    "#     cout = csv.DictWriter(fout, ['first', 'last'])\n",
    "    cout = csv.DictWriter(fout, ['first', 'last','last33'])\n",
    "    cout.writeheader()  #寫檔頭\n",
    "    cout.writerows(villains)\n",
    "    \n",
    "with open('Data/villains', 'rt') as fin:\n",
    "    cin = csv.DictReader(fin)\n",
    "    villains = [row for row in cin]\n",
    "print(villains)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "menu\n",
      "\ttag: item attributes: breakfast burritos\n",
      "\ttag: item attributes: {'price': '$6.00'}\n",
      "\ttag: item attributes: pancakes\n",
      "\ttag: item attributes: {'price': '$4.00', 'XDD': 'QQ'}\n",
      "\ttag: item attributes: hamburger\n",
      "\ttag: item attributes: {'price': '$5.00'}\n",
      "\ttag: item attributes: spaghetti\n",
      "\ttag: item attributes: {'price': '8.00'}\n",
      "3\n",
      "2\n"
     ]
    }
   ],
   "source": [
    "import xml.etree.ElementTree as et\n",
    "tree = et.ElementTree(file='Data/menu.xml')\n",
    "root = tree.getroot()\n",
    "print(root.tag)\n",
    "\n",
    "for child in root:\n",
    "    #print('tag:', child.tag, 'attributes:', child.attrib)\n",
    "    for grandchild in child:\n",
    "        print('\\ttag:', grandchild.tag, 'attributes:', grandchild.text)\n",
    "        print('\\ttag:', grandchild.tag, 'attributes:', grandchild.attrib)\n",
    "        \n",
    "        \n",
    "print(len(root))\n",
    "print(len(root[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'str'>\n",
      "{'lunch': {'items': {'hamburger': '$5.00'}, 'hours': '11-3'}, 'breakfast': {'items': {'breakfast burritos': '$6.00', 'pancakes': '$4.00'}, 'hours': '7-11'}, 'dinner': {'items': {'spaghetti': '$8.00'}, 'hours': '3-10'}}\n",
      "{'items': {'breakfast burritos': '$6.00', 'pancakes': '$4.00'}, 'hours': '7-11'}\n"
     ]
    }
   ],
   "source": [
    "menu = \\\n",
    "{\n",
    "    \"breakfast\": {\n",
    "        \"hours\": \"7-11\",\n",
    "        \"items\": {\n",
    "            \"breakfast burritos\": \"$6.00\",\n",
    "            \"pancakes\": \"$4.00\"\n",
    "        }\n",
    "    },\n",
    "    \"lunch\" : {\n",
    "        \"hours\": \"11-3\",\n",
    "        \"items\": {\n",
    "            \"hamburger\": \"$5.00\"\n",
    "        }\n",
    "    },\n",
    "    \"dinner\": {\n",
    "        \"hours\": \"3-10\",\n",
    "        \"items\": {\n",
    "            \"spaghetti\": \"$8.00\"\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "import json\n",
    "menu_json = json.dumps(menu)\n",
    "print(type(menu_json))\n",
    "# print(menu_json['breakfast'])\n",
    "\n",
    "menu2 = json.loads(menu_json)\n",
    "print(menu2)\n",
    "print(menu2['breakfast'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='StructuredBinary'></a>\n",
    "## 8.3 結構化二進位檔案\n",
    "[回目錄](#HOME)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='RelationalDatabases'></a>\n",
    "## 8.4 關聯式資料庫 SQL\n",
    "[回目錄](#HOME)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='NoSQL'></a>\n",
    "## 8.5 NoSQL資料庫\n",
    "[回目錄](#HOME)\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='Full-TextDatabases'></a>\n",
    "## 8.6 全文檢索資料庫\n",
    "[回目錄](#HOME)\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
